A self-timed interface (STI) in which a clock signal clocks bit
serial data onto a parallel, electrically conductive bus and the
clock signal is transmitted on a separate line of the bus. The
received data on each line of the bus is individually phase
aligned with the clock signal. The received clock signal is used
to define boundary edges of a data bit cell individually for each
line, and the data on each line of the bus is individually phase
adjusted so that, for example, a data transition position is in
the center of the cell.
This invention relates to an improved method and apparatus for
transmitting digital data at high speeds via a parallel data bus,
and more particularly, to a method and apparatus that provides a
cost effective short-haul interface for a wide variety of data
transfer applications while eliminating precise bus length and
system clock rates as a critical or limiting factor in system
design.

The present United States patent application is related to the
following co-pending United States patent applications
incorporated herein by reference:

Application Serial No. 262,081, filed 06/17/94 (attorney Docket
No. PO9-93-053), entitled "Digital Phase Locked Loop with Improved
Edge Detector," and assigned to the assignee of this application.

Application Serial No. 261,522, filed 06/17/94 (attorney Docket
No. PO9-93-056), entitled "Multiple Processor Link," and assigned
to the assignee of this application.

Application Serial No. 261,561, filed 06/17/94 (attorney Docket
No. PO9-93-057), entitled "Enhanced Input-Output Element," and
assigned to the assignee of this application.

Application Serial No. 261,603, filed 06/17/94 (attorney Docket
No. PO9-93-058), entitled "Massively Parallel System," and
assigned to the assignee of this application.

Application Serial No. 261,523, filed 06/17/94 (attorney Docket
No. PO9-93-059), entitled "Attached Storage Media Link," and
assigned to the assignee of this application.

Application Serial No. 261,641, filed 06/17/94 (attorney Docket
No. PO9-93-060), entitled "Shared Channel Subsystem," and assigned
to the assignee of this application.

As will be appreciated by those skilled in the art, such factors
as noise and loading limit the useful length of parallel busses
operating at high data rates. In the prior art, the length of the
bus must be taken into account in the system design, and the bus
length must be precisely as specified. Manufacturing tolerances
associated with the physical communication link (chips, cables,
card wiring, connectors, etc.), temperature, and variations in
power supply voltage also limit the data rates on prior art
busses comprised of parallel conductors. Further, many prior art
computer systems transfer data synchronously with respect to a
processor clock, so that a change in processor clock rate may
require a redesign of the data transfer bus.

An object of this invention is the provision of a cost effective
bus data transfer system that can operate at high data transfer
rates without tight control of the bus length, and without system
clock constraints; a system in which the maximum bus length is
limited only by the attenuation loss in the bus.



Another object of the invention is the provision of a general
purpose, low cost, high performance, point-to-point data
communication link where the width and speed of the interface can
easily be modified to tailor it to specific bandwidth requirements
and to specific implementation technologies, including VLSI
technologies.

A further object of the invention is the provision of a bus data
transfer system that operates at a clock speed equal to or less
than the data rate.

A more specific object of the invention is the 
provision of a system that adjusts the phase or 
arrival time of the incoming data on the receive 
side so it can be optimally sampled by the local 
receive clock, compensating for many of the manu- 
facturing tolerances associated with the physical 
link (chip, cable, card wiring, connectors, etc.) as 
well as temperature changes and power supply 
output variations. 

Briefly, this invention contemplates the provision of a self-timed
interface (STI) in which a clock signal clocks bit serial data onto
a parallel, electrically conductive bus and the clock signal is
transmitted on a separate line of the bus. The received data on
each line of the bus is individually phase aligned with the clock
signal. The received clock signal is used to define boundary edges
of a data bit cell individually for each line, and the data on each
line of the bus is individually phase adjusted so that, for
example, a clock transition position is in the center of the data
cell. The data is written into a buffer using the received link
clock and then read out synchronously with the receiver system
clock. At the data rates contemplated in the application of this
invention, the propagation delay is significant. However, within
limits, the bus length is not critical and is independent of the
transmit and receive system clocks.

In one specific embodiment of the invention, data to be
transmitted is transferred to a buffer synchronously with the
transmitter system clock, which may or may not be the receiver
system clock. A controller formats the data into packets for byte
parallel, bit serial transmission along with headers specifically
coded to provide unique data patterns that allow for correction of
skew of up to three bit cells in addition to the initial phase
adjustment.

The foregoing and other objects, aspects and 
advantages will be better understood from the fol- 
lowing detailed description of a preferred embodi- 
ment of the invention with reference to the draw- 
ings, in which: 

Figure 1 is an overview block diagram illustrat- 
ing the application of a self-timed interface, in 
accordance with the teachings of this invention, 
to data communication among computer chips. 






EP 0 687 982 A1 






Figure 2 is a block diagram illustrating one em- 
bodiment of a transmitter serializer for imple- 
menting a self-timed interface in accordance 
with this invention. 

Figure 3 is a block diagram illustrating byte 
synchronization in accordance with the inven- 
tion. 

Figure 4 is a block diagram illustrating the next step in the byte
synchronization process.

Figure 5 illustrates phase alignment and sampling logic in
accordance with a preferred embodiment of the invention.

Referring now to Figure 1 of the drawings, it 
illustrates one embodiment in which a self-timed 
interface in accordance with the teachings of this 
invention can be used. This exemplary embodi- 
ment of the self-timed interface provides data com- 
munications between two microprocessor chips, 
labeled here as Chip A and Chip B. However, as 
will be apparent to those skilled in the art, the self- 
timed interface of this invention is applicable to 
provide data transfer between a wide variety of 
components or nodes. 

Chip A has a transmit port labeled 12A and 
Chip B has a transmit port labeled 12B. Similarly, 
Chips A and B have receive ports labeled 14A and 
14B, respectively. The ports are connected by two 
self-timed interface busses 16; one for each trans- 
mission direction. In this exemplary embodiment of 
the invention, each bus 16 is one byte wide, and 
comprised of nine electrical conductors; eight con- 
ductors for data and one conductor for a clock 
signal. 

Each transmit port (12A and 12B) includes a 
transmit logical macro 18 that provides a logical 
interface between the host logic and the self-timed 
interface link 16. Sync buffers 22 provide an inter- 
face between the host clock and the self-timed 
interface clock. This allows the self-timed interface 
link to run at a predetermined cycle time that is 
independent of the host clock, making the self- 
timed interface link independent of the host. An 
outbound physical macro 24 serializes a word-wide 
data flow into a byte-wide data flow that is transmit- 
ted along with the clock on the self-timed interface 
link 16. 

Each receive port (i.e., 14A and 14B) includes an inbound physical
macro 26 that first dynamically aligns each data bit with the
self-timed interface clock signal. It aligns any bits with skew of
up to three bit cells and deserializes the bytes into words. A
receive logical macro 28 provides an interface between the
self-timed interface receiver logic and the host logic and
generates link acknowledge signals and link reject signals, which
are coupled by internal links 33 and transmitted back to the
transmitting port via an outbound self-timed interface link 16.

In order to compensate for variations in electrical path delay,
the phase of the incoming data is adjusted, or self-timed. Each bit
(line) is individually phase aligned to the transmitted reference
clock and further aligned to compensate, within this embodiment,
for up to three bit cells of skew between any two data lines. The
self-timing operation has three parts: the first is to acquire bit
synchronization; the second is byte/word alignment; and the third
is maintaining synchronization.
In acquiring bit synchronization, the link takes itself from a
completely untimed state into synchronous operation. Any previous
condition on the STI interface or logic is disregarded with a
complete logic reset. Bit synchronization can be established
rapidly, for example on the order of 200 microseconds. The phase of
the incoming data is manipulated on a per line basis until the data
valid window, or bit interval, is located. This is accomplished
using a phase detector that locates an average edge position on the
incoming data relative to the local clock. Using two phase
detectors, one can locate two consecutive edges on the data, and
these two consecutive edges define the bit interval or data valid
window. The phase of the data that is sampled by the local clock is
the one located halfway between the two edges of the data.

Byte alignment takes place by manipulating the serial data stream
in whole bit times to properly adjust the byte position relative to
a deserializer output. Referring now to Figure 4, word alignment
takes place next by manipulating the deserializer data four bit
intervals at a time to ensure proper word synchronization on the
STI interface. A timing sequence allows proper bit, byte and word
synchronization.

Synchronization maintenance occurs as part of 
the link operation in response to temperature and 
power supply variations. 

Referring now to Figure 2, it illustrates one embodiment of a
transmit serializer for a bit serial, byte parallel interface used
in the practice of the invention. Here a four byte wide data
register 23 receives parallel inputs 25 (bytes 0, 1, 2 and 3
inputs shown here), and multiplexers 19 and 2:1 selector 27
multiplex the register output to a one byte wide output of off chip
driver 15 coupled to a self-timed interface bus. Data is clocked
from the register 23 by divide-by-two logic 12 whose input is the
self-timed interface clock signal on line 27. Bit zero from bytes
0, 1, 2 and 3 is serialized and transmitted on link 0 of the
self-timed interface, shown here. Bit 1 from bytes 0, 1, 2 and 3
will be transmitted on link 1 (not shown), and so on.
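As a rough software model of the bit-to-line mapping just described (bit i of bytes 0 through 3 sent serially on line i), the following sketch splits a four-byte word into eight per-line bit sequences and reassembles it. The function names, the choice of bit 0 as the least significant bit, and the assumption that byte 0 is sent first are all illustrative and are not taken from Figure 2.

```python
def serialize_word(word_bytes):
    """Split a 4-byte word into 8 per-line bit sequences.

    Line i carries bit i of byte 0, then byte 1, byte 2 and byte 3.
    Bit 0 is assumed to be the least significant bit.
    """
    if len(word_bytes) != 4:
        raise ValueError("expected exactly four bytes")
    return [[(b >> line) & 1 for b in word_bytes] for line in range(8)]


def deserialize_word(lines):
    """Inverse of serialize_word: rebuild the four bytes from 8 lines."""
    word = [0, 0, 0, 0]
    for line, bits in enumerate(lines):
        for k, bit in enumerate(bits):
            word[k] |= bit << line
    return word


lanes = serialize_word([0x01, 0x80, 0xFF, 0x00])
print(lanes[0])  # [1, 0, 1, 0]
print(lanes[7])  # [0, 1, 1, 0]
```

The round trip through `deserialize_word` mirrors what the receiver's deserializer must do once every line has been phase aligned and deskewed.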

To minimize the bandwidth requirements of the communication media,
the STI clock is one half the frequency of the transmitted data
(baud) rate, i.e., a 75 MHz clock will be used for a 150 Mbit/s
data rate. The clock will be generated from an STI oscillator
source; this is done to decouple the system or host clock from the
STI link. The data will be transmitted on both edges of the clock.
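The rate arithmetic for transmitting on both clock edges can be checked with a small helper. `link_rates` is a hypothetical name; `edges_per_cycle=1` corresponds to the clock-equals-data-rate case of Figure 5. The bus figure assumes the eight data lines of the byte-wide bus in Figure 1.

```python
def link_rates(clock_hz, edges_per_cycle=2):
    """Return (per-line data rate in bit/s, bit time in ns).

    edges_per_cycle=2 models data launched on both clock edges;
    edges_per_cycle=1 models a clock equal to the data rate.
    """
    bit_rate = clock_hz * edges_per_cycle
    bit_time_ns = 1e9 / bit_rate
    return bit_rate, bit_time_ns


rate, t_bit = link_rates(75e6)
print(rate)                         # 150000000.0 bit/s per line
print(round(t_bit, 3))              # 6.667 ns per bit cell
bus_bits_per_s = rate * 8           # eight data lines on the byte-wide bus
print(bus_bits_per_s / 8 / 1e6)     # 150.0 MB/s
```

Halving the clock relative to the data rate keeps the highest frequency on the clock line equal to that of an alternating 1010 data pattern.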

Referring now to Figure 3, assuming a bit synchronization process
as described in connection with Figure 5 has been completed, byte
synchronization starts by coupling the phase aligned data (now 2
bits wide) into shift registers 33 whose outputs are coupled to
multiplexer 35. Control inputs 37 to the multiplexer are used to
deskew the particular data line from the other data lines by whole
bit times. The deserializer data output for a particular data line
is monitored for an expected timing pattern (e.g., X 0 1 0, where X
is a don't care) to determine the proper order of the received
data. If at any time a zero is detected in the bit 3 position, the
multiplexer is incremented, thus moving the byte boundary by one
bit time. This process is repeated until the proper byte boundary
is located. The multiplexer control wraps around from a binary 3 to
a binary 0 in case the correct position was incorrectly passed
through the previous time. This function allows synchronization of
data lines skewed by more than an entire bit time.
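The byte-boundary search can be sketched as follows. This is a simplified model, not the circuit of Figure 3: it compares the whole 4-bit deserializer output against the X 0 1 0 pattern rather than testing only the bit 3 position. The select value, the function names and the example stream are hypothetical. As in the text, the select moves the boundary by one bit time per miscompare and wraps from 3 back to 0.

```python
EXPECTED = (None, 0, 1, 0)  # timing pattern X 0 1 0; None marks the don't care


def matches_pattern(window):
    """True if a 4-bit deserializer output matches X 0 1 0."""
    return all(e is None or bit == e for bit, e in zip(window, EXPECTED))


def find_byte_boundary(stream):
    """Advance a hypothetical 2-bit mux select until the pattern matches.

    Each deserializer cycle consumes 4 bits; the select offsets the 4-bit
    window by 0..3 whole bit times. Returns the select value, or None if
    the stream runs out first.
    """
    sel, cycle = 0, 0
    while 4 * cycle + sel + 4 <= len(stream):
        start = 4 * cycle + sel
        if matches_pattern(stream[start:start + 4]):
            return sel
        sel = (sel + 1) % 4  # move the byte boundary by one bit time
        cycle += 1
    return None


# Hypothetical line whose pattern arrives two bit times late.
received = [1, 1] + [0, 0, 1, 0] * 4
print(find_byte_boundary(received))  # 2
```

A hardware implementation would keep monitoring after the first match (the text's error latch does this over a 16 microsecond window), whereas this sketch stops at the first matching cycle.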

Finally, word alignment takes place. Referring now to Figure 4,
word alignment is established by manipulating the deserializer
output bus four bits at a time until word synchronization is
established. Note that the first register is shifted by four bit
times relative to the second register. Four bit times is the
maximum any data bit can be skewed relative to another data bit (3
bit times on the link + 1 bit time from the phase alignment
section).
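The skew budget above (3 bit times on the link plus 1 from phase alignment) can be expressed as a small calculation. In this hypothetical sketch, each line's required delay relative to the latest-arriving line is split into a word-level shift in steps of four bit times and a byte-level shift of 0 to 3 whole bit times. The function name and the notion of measured per-line arrival times are assumptions for illustration, not part of the described circuit.

```python
MAX_SKEW_BITS = 3 + 1  # 3 bit times on the link + 1 from phase alignment


def deskew_settings(arrival_bits):
    """Split each line's deskew delay into (word_shift, byte_shift).

    arrival_bits[i] is a hypothetical arrival time of line i in whole bit
    times. Early lines are delayed to match the latest one; word_shift is
    0 or 4 bit times and byte_shift is 0..3 bit times.
    """
    latest = max(arrival_bits)
    settings = []
    for t in arrival_bits:
        delay = latest - t
        if delay > MAX_SKEW_BITS:
            raise ValueError(f"skew of {delay} bit times exceeds the budget")
        settings.append((4 * (delay // 4), delay % 4))
    return settings


print(deskew_settings([0, 4, 2, 3]))  # [(4, 0), (0, 0), (0, 2), (0, 1)]
```

Because the largest possible relative skew is four bit times, a single four-bit word shift plus the 0 to 3 bit byte multiplexer covers every case.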

As will be appreciated by those skilled in the art, any of a number
of circuits, such as a digital phase lock loop, can be used as the
self-timer 52 to provide individual phase synchronization between
the clock and the data. However, in a preferred embodiment of the
invention, the novel edge detector disclosed in co-pending
application Serial No. 262,087, filed 06/17/94, assigned to the
assignee of this application, and incorporated herein by
reference, is used.

Referring now to Figure 5, in this embodiment of the invention, the
clock rate is the same as the data rate. The data edges that define
a data window are each detected independently of the other, and the
data is sampled at the midpoint between the edges when the edges
have been aligned with the clock. The positions of the edges of
incrementally separated phases of the input data stream are
successively compared with the positions of the rising and falling
edges of the clock in order to locate the edges of the data stream
with respect to both edges of the clock.

The data phase pairs are generated in this specific embodiment of
the invention by three incrementally selectable delay elements 80,
82, and 84. For example, the elements 80 and 82 provide delays,
respectively, in 1/10th and 1/5th bit time increments, and element
84 provides fine increments on the order of 1/20th of a bit time.
The fine delay element 84 is separated into three groups to provide
early edge detection, system data detection, and late edge
detection. An early guard band selector 86 successively selects one
phase of the data stream to provide an "early" phase of the
incrementally separated phases, one for the rising edge and one for
the falling edge. Similarly, a late guard band selector 90
successively selects one phase of the data stream to provide a
"late" phase of the incremental phases, again one for the rising
edge and one for the falling edge. A selector 88 selects
incremental phases for the mid-cell system data position.

A selected data phase is coupled as an input to master-slave
RES-FES latch pairs 92, 94, and 96. The rising edge data samples
are clocked into the RES latches and the falling edge data samples
are clocked into the FES latches. The outputs of the RES-FES latch
pair 92 are connected to an early edge detector 98. Similarly, the
outputs of the RES-FES latch pair 96 are coupled to a late edge
detector 100. The RES latch of pair 94 is coupled to the early edge
detector 98, and the FES latch of pair 94 is coupled to the late
edge detector 100.

Each edge detector (98 and 100) outputs a "lead", a "lag" or a "do
nothing" output, which indicates the location of a data edge with
respect to the reference clock edge location. The output of each
edge detector is coupled via a suitable filter 102 (e.g., a random
walk filter) back to its respective selector, 86 or 90. The
selectors shift the phase of the data coupled to the RES-FES
latches in the direction indicated or, if "do nothing" is
indicated, leave the phase of the data at that edge unshifted.
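One common way to realize the filter and selector loop just described is an up/down counter (a random walk filter) that passes a shift only after a persistent run of lead or lag decisions. The sketch below assumes a filter depth, a 20-tap fine delay line and a sign convention (lead moves the tap index up) that are not given in the text; the class names are hypothetical.

```python
class RandomWalkFilter:
    """Up/down counter that passes only a persistent lead or lag.

    +1 per "lead", -1 per "lag", 0 per "do nothing". When the count
    reaches +depth or -depth, one shift is emitted and the count resets.
    """

    def __init__(self, depth=8):
        self.depth = depth
        self.count = 0

    def update(self, decision):
        self.count += {"lead": 1, "lag": -1, "do nothing": 0}[decision]
        if self.count >= self.depth:
            self.count = 0
            return 1
        if self.count <= -self.depth:
            self.count = 0
            return -1
        return 0


class PhaseSelector:
    """Hypothetical tap selector over a fine delay line of n_taps phases."""

    def __init__(self, n_taps=20, tap=10):
        self.n_taps = n_taps
        self.tap = tap

    def shift(self, direction):
        # Clamp at the ends of the delay line; reaching an end is the
        # condition the guard-band monitoring has to handle.
        self.tap = max(0, min(self.n_taps - 1, self.tap + direction))


rwf, sel = RandomWalkFilter(depth=4), PhaseSelector()
for d in ["lead", "lag", "lead", "lead", "lead", "lead"]:
    sel.shift(rwf.update(d))
print(sel.tap)  # 11
```

Isolated lead or lag decisions caused by jitter cancel out in the counter, so the selector moves only when the edge position has genuinely drifted.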

Data control logic 104 controls the system data output by selecting
the phase of the data that is halfway between the two data edges
when the data edges are aligned with the reference clock. A phase
of the data (Data 1 and Data 2) is output at each reference clock
edge.

In operation of a specific embodiment, at power on the logic will
automatically begin the bit synchronization process. A 16
microsecond timer is started, the bulk delays are reset to their
minimum delay, and a 16 bit counter running off the divided down
clock is started. The edge detect circuitry will sample the
incoming data with the received reference clock. The edge detector
will output a "lead", a "lag" or a "do nothing" signal that
indicates the data edge location relative to the reference clock.
This signal is filtered by a Random Walk Filter (RWF) and fed back
to the selectors of their respective RES and FES circuits. The
selectors shift the phase of the data into the RES and FES as
indicated by the edge detector. Each edge detector operates
independently of the other. Each will locate the transitions on
data relative to the received (reference) clock by manipulating the
incoming phase of the data into the edge detector as described
above. The phase of the system data is controlled by the data
control logic, which selects the phase of the data halfway between
the two edge detectors. In parallel with the bit synchronization
process, the order of bits out of the deserializer is manipulated
to the correct order (see the byte/word synchronization
description above). When the 16 microsecond timer trips, the
algorithm resets a deserializer error latch and restarts the 16
microsecond counter. The deserializer output is compared against
the expected timing pattern (X 0 1 0, where X is a don't care). A
single miscompare on any cycle during the next 16 microseconds will
set the deserializer error latch. When the 16 microsecond counter
trips again, the algorithm checks the addresses of the early guard
band (EGB), late guard band (LGB), and data selectors and the
deserializer error latch. In order for a bit to end the initial bit
synchronization search state, the deserializer error latch must
have remained reset AND all selectors must be properly centered in
their tracking range (centering ensures that adjustments can be
made to allow for the tracking of temperature and power supply
variations after the initial bit synchronization process). If
either condition is not met, the algorithm adds a bulk delay
element, resets the 16 microsecond counter, and the search process
begins once again. Each and every bit (data line) on the STI
interface undergoes this process in parallel. Once an individual
data line is determined to meet the initial bit synchronization
criteria described above, it is de-gated while the other lines
continue to be adjusted. The bit synchronization process is
complete once all bits are adjusted and meet the search criteria.
The logic will not exit the bit synchronization mode until the 16
bit counter trips.
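The per-line search loop described above can be summarized in a short sketch. Here `interval_ok` is a hypothetical stand-in for one 16 microsecond interval, returning whether the deserializer error latch was set and whether the selectors are centered. The `max_bulk` limit is an assumption, and the 16 bit counter exit condition is not modeled.

```python
def bit_sync_search(lines, interval_ok, max_bulk=8):
    """Search loop for initial bit synchronization, one pass per interval.

    interval_ok(line, bulk) -> (error_latch_set, selectors_centered)
    Returns a dict mapping each line to its number of bulk delay elements.
    """
    bulk = {line: 0 for line in lines}
    pending = set(bulk)
    while pending:
        # sorted() copies the set, so lines can be de-gated inside the loop.
        for line in sorted(pending):
            error_set, centered = interval_ok(line, bulk[line])
            if not error_set and centered:
                pending.discard(line)  # de-gate: this line stops adjusting
            elif bulk[line] < max_bulk:
                bulk[line] += 1        # add a bulk delay element, search again
            else:
                raise RuntimeError(f"line {line} did not synchronize")
    return bulk


def example_interval(line, bulk):
    """Hypothetical line model: line n needs (n % 3) bulk delay elements."""
    needed = line % 3
    return bulk < needed, bulk == needed


print(bit_sync_search(range(8), example_interval))
# {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2, 6: 0, 7: 1}
```

Each line converges on its own bulk delay setting independently, which reflects the statement that every data line undergoes the process in parallel and is de-gated once it passes.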

During normal operation the physical macro will continuously
monitor the incoming data to ensure that the optimum clock sampling
relationship exists. Small updates will be made to track
temperature variations, power supply variations and data jitter.
These updates will be seamless and transparent to the host logic.
Approximately 1/2 a bit time of delay will be needed to compensate
for temperature and power supply variations to maintain proper
synchronization. This added delay is in the fine delay elements
section. There is also circuitry to monitor the position of the
guard bands relative to the allowable range of operation. If a
guard band reaches the end of its range, two cases exist. In the
first case, a new bulk delay element is added and the fine delay
elements are adjusted accordingly. Note that this can cause
sampling errors in the data. The circuitry that makes these on the
fly bulk adjustments can be inhibited so that no on the fly bulk
delay adjustments are made during normal operation. The second case
exists when one of the guard bands reaches the end of its range and
the on the fly bulk delay adjustment is inhibited: the physical
macro will signal the logical STI macro that a bit synchronization
is required soon. The link should finish the immediate work and
force the link into timing mode.
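The guard-band decision during normal operation reduces to a three-way choice. The sketch below assumes that "end of range" means the first or last fine delay tap; the function name and the return strings are hypothetical.

```python
def guard_band_action(tap, n_taps, bulk_adjust_enabled):
    """Choose the response as a guard band moves within its fine delay range.

    Returns "track", "add_bulk" or "request_resync".
    """
    at_end = tap <= 0 or tap >= n_taps - 1
    if not at_end:
        return "track"           # normal small, seamless updates
    if bulk_adjust_enabled:
        return "add_bulk"        # case 1: add bulk delay and re-adjust the
                                 # fine delay (may cause sampling errors)
    return "request_resync"      # case 2: ask the logical STI macro for bit sync


for tap, enabled in [(10, True), (19, True), (0, False)]:
    print(tap, enabled, guard_band_action(tap, 20, enabled))
```

Inhibiting the on the fly bulk adjustment trades a possible burst of sampling errors for a planned resynchronization that the logical macro can schedule.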

While the invention has been described in 
terms of a single preferred embodiment, those 
skilled in the art will recognize that the invention 
can be practiced with modification within the spirit 
and scope of the appended claims. For example, 
while the invention has been illustrated with the 
data stream delayed relative to the clock, the same 
results can be obtained by generating multiple 
phases of the clock relative to the data stream. 

Claims 

1. A self-timed communications interface for transmitting digital
data between a first node and a second node over a plurality of
digital data lines and a clock signal line, comprising in
combination:
said first node including:
a digital data buffer;

means for generating a communications clock signal;

means responsive to said communications clock signal for coupling
digital data from said digital data buffer to said plurality of
digital data lines synchronously with said communications clock
signal; and

means to couple said communications clock signal to said clock
signal line;
said second node including:
means for receiving said digital data signal coupled to said
plurality of digital data lines;
means for receiving said communications clock signal coupled to
said communications clock signal line;

comparing means coupled to said means for receiving said digital
data signal and said means for receiving said communications clock
signal;

said comparing means comparing a phase of said communications clock
signal with a phase of said digital data signal coupled to each of
said plurality of digital data lines, and
means coupled to said comparing means to adjust the phase of said
digital data signal coupled to each of said plurality of digital
data lines independently relative to said communications clock
signal in order to bring into phase synchronism said digital data
signal coupled to each of said plurality of data lines and said
communications clock signal.

2. A method for transmitting digital data between a first node and
a second node comprising the steps of:

transmitting said digital data on a plurality of transmission
lines, which connect said first node and said second node,
synchronously with a digital clock signal;
transmitting said digital clock signal on one transmission line,
which connects said first node and said second node;

receiving said digital data and said digital clock signal;

aligning the phase of said digital data on each of said plurality
of transmission lines with said digital clock signal received in
said receiving step.

3. A self-timed communications interface as in claim 1 wherein
said comparing means includes means for aligning an edge of said
communications clock signal with one edge of said digital data
signal.

4. A self-timed communications interface as in claim 1 wherein
said comparing means includes means for aligning both edges of
said communications clock signal with said digital data signal.

5. A self-timed communications interface as in claim 1 further
including means for correcting skew in data bits on said lines
where the bits have been phase aligned with said clock signal.

6. A self-timed communications interface as in claim 5 where said
means for correcting skew corrects skew up to three bit positions.

7. A method for transmitting digital data as in claim 2 wherein
said aligning step aligns the phase of the digital data with both
edges of the clock signal.

8. A method for transmitting digital data as in claim 2 including
the further step of correcting skew in data bits on said lines
where the bits have been phase aligned with said clock signal.